Democratizing Access to Education Data

The Urban Institute’s Education Data Portal

Erika Tyagi

The Education Data Portal bridges the gap between data availability and data accessibility.

  1. What do I mean by the availability-accessibility gap?
  2. How does the portal bridge this gap?
  3. Why does bridging this gap matter?

What do I mean by the data availability-accessibility gap?

Example: Collecting data on COVID in jails and prisons

  • A spreadsheet
  • Scanned as a PDF
  • With dark text on a dark background
  • And a little blurry
  • And inconsistent rows and columns
  • And the occasional coffee spill

Accessible to whom?

How does the portal bridge this gap?

  • Provides a one-stop shop for 100+ datasets released by government agencies and other institutions on schools, school districts, and colleges in the U.S.
  • Includes harmonized data and metadata for each dataset
  • Makes it easier for users to look at trends over time and combine data from different sources

How does the portal bridge this gap?

Example: How has tuition at my alma mater changed?

Without the Education Data Portal…

Example: How has tuition at my alma mater changed?

  • Find the agency collecting the data
  • Read the data documentation
  • Download data files for each year
  • Load each file into R or Python
  • Notice a few anomalies
  • Re-read the data documentation
  • Take an ice cream break (instead of giving up)
  • Update the code per the documentation
  • Remember to repeat the process again next year
  • (And hope nothing changes)

This is tedious, error-prone, and simply not fun.
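
To make the manual steps above concrete, here is a minimal Python sketch of the "load each file and reconcile it" part. Everything specific in it is an assumption for illustration: the column aliases, the helper names load_year and stack_years, and the two tiny in-memory files (whose values are made up and are not real tuition figures). Real source files will differ from year to year in their own ways.

```python
import csv
import io

# Hypothetical mapping from year-specific column names to one harmonized name.
# Real source files use their own names; these are placeholders.
COLUMN_ALIASES = {
    "TUITION": "tuition_fees_ft",
    "tuition_ft": "tuition_fees_ft",
    "UNITID": "unitid",
    "unit_id": "unitid",
}


def load_year(year, file_obj):
    """Read one year's CSV and return rows with harmonized column names."""
    rows = []
    for raw in csv.DictReader(file_obj):
        row = {COLUMN_ALIASES.get(name, name): value for name, value in raw.items()}
        row["year"] = year
        rows.append(row)
    return rows


def stack_years(files_by_year):
    """Combine per-year files into one list of rows, ordered by year."""
    combined = []
    for year in sorted(files_by_year):
        combined.extend(load_year(year, files_by_year[year]))
    return combined


# Placeholder inputs standing in for two downloaded files (values are made up).
files = {
    2001: io.StringIO("unit_id,tuition_ft\n000001,200\n"),
    2000: io.StringIO("UNITID,TUITION\n000001,100\n"),
}
panel = stack_years(files)
```

Every new year of data can add another alias or another exception, which is why this kind of code has to be revisited each time the source changes.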

Using the portal R package

Example: How has tuition at my alma mater changed?

library(educationdata)
library(ggplot2)

# Get data for one institution (unitid) across 1990-2020
data <- get_education_data(
  level = "college-university",
  source = "ipeds",
  topic = "academic-year-tuition",
  filters = list(
    year = 1990:2020,
    unitid = "173258",
    tuition_type = "4"
  )
)

# Plot data
ggplot(data, aes(x = year, y = tuition_fees_ft)) +
  geom_line()

Using the portal Python package

Example: How has tuition at my alma mater changed?

from educationdata import get_education_data

# Get data
data = get_education_data(
  level = "college-university",
  source = "ipeds",
  topic = "academic-year-tuition",
  filters = {
    "year": range(1990, 2021),  # end is exclusive, so this covers 1990-2020
    "unitid": "173258",
    "tuition_type": "4"
  }
)

# Plot data
data.plot.line(
  x = "year", y = "tuition_fees_ft"
)

Using the portal Stata package

Example: How has tuition at my alma mater changed?

* Get data 
educationdata using ///
  "college ipeds academic-year-tuition", sub( ///
  year=1990/2020 ///
  unitid=173258 ///
  tuition_type=4 ///
)

* Plot data 
twoway (line tuition_fees_ft year)

Using the portal Data Explorer

Example: How has tuition at my alma mater changed?

Why do I think the portal bridges this gap so effectively?

  1. By focusing on the underlying API
  2. By focusing on data documentation

The underlying API

  • 120+ data endpoints (with the data)
  • 12+ metadata endpoints (about the data)
  • All other tools, packages, and documentation are built on these endpoints
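
As a rough illustration of what "built on these endpoints" can mean, the Python sketch below builds a request URL and follows paginated responses. It rests on assumptions that should be checked against the portal's API documentation: the base URL, the level/source/topic/year path layout, and the "results" and "next" keys in the response. The helper names build_url and fetch_all are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL and path layout; verify against the API documentation.
BASE_URL = "https://educationdata.urban.org/api/v1"


def build_url(level, source, topic, year, filters=None):
    """Build an endpoint URL, adding filters as query parameters."""
    url = f"{BASE_URL}/{level}/{source}/{topic}/{year}/"
    if filters:
        url += "?" + urlencode(filters)
    return url


def fetch_all(url, get_json):
    """Follow 'next' links until exhausted, collecting 'results' from each page.

    get_json is any callable that takes a URL and returns the parsed JSON
    body as a dict, so the paging logic can be tested without a network call.
    """
    records = []
    while url:
        page = get_json(url)
        records.extend(page.get("results", []))
        url = page.get("next")
    return records
```

Passing the fetching function in as an argument keeps the paging logic independent of any particular HTTP library; in practice get_json could wrap urllib.request.urlopen and json.load.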

Data documentation

  • Considered a first-order priority
  • For humans and machines
  • With details on demand
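
One way to picture documentation "for humans and machines" is a single metadata record that serves both audiences. The Python sketch below is purely illustrative: the record's field names, its placeholder value labels, and the helpers describe and validate_filter are all hypothetical and are not taken from the portal's actual metadata endpoints.

```python
# Hypothetical metadata record; field names and labels are placeholders.
VARIABLE_METADATA = {
    "variable": "tuition_type",
    "label": "Tuition type",
    "values": {"1": "Placeholder label A", "2": "Placeholder label B"},
}


def describe(meta):
    """Render the record as short text for a human reader."""
    lines = [f"{meta['variable']}: {meta['label']}"]
    for code, label in meta["values"].items():
        lines.append(f"  {code} = {label}")
    return "\n".join(lines)


def validate_filter(meta, value):
    """Use the same record to check a filter value before sending a request."""
    if str(value) not in meta["values"]:
        allowed = ", ".join(meta["values"])
        raise ValueError(f"{meta['variable']} must be one of: {allowed}")
    return str(value)
```

Because both functions read the same record, the text a person sees and the check a program runs cannot drift apart.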

Why does bridging this gap matter?

Different people ask different—and important—questions.

Get in touch